Abstract

We develop a systematic cluster expansion for dilute systems in the highly
dilute phase. We first apply it to the calculation of the entropy of the
K-satisfiability problem in the satisfiable phase. We derive a series expansion
in the control parameter, the average connectivity, that is identical to the
one obtained by using the replica approach with a replica symmetric (RS)
Ansatz, when the order parameter is calculated via a perturbative expansion
in the control parameter. As a second application we compute the free-energy
of the Viana-Bray model in the paramagnetic phase. The cluster expansion
allows one to compute finite-size corrections in a simple manner, and these
are particularly important in optimization problems. Importantly,
these calculations prove the exactness of the RS Ansatz below the percolation
threshold and suggest that it might require revision between this threshold and
the easy-to-hard transition.

LPTENS/0046, LPTHE/0109 

1 Introduction 

Very few analytical tools have been successfully employed to study disordered
systems beyond mean-field. Mainly, one can mention the functional renormalization
group analysis [1], high-temperature expansions of finite dimensional systems [2],
expansions in the concentration of disordered models defined on finite dimensional
lattices [3] and expansions around mean-field theories [4]. The replica method has
been used to study dilute spin-glass models [5-9] and, even if it allows one to
obtain a number of analytical results, it has proven particularly difficult to
implement when applied to dilute systems. It is therefore desirable to develop other
analytical methods to treat these problems, at least in their simplest phase. In this
article we investigate an independent analytic approach that is based on a cluster
expansion. It allows one to compute several "additive" quantities of interest in
dilute systems such as the energy density, the entropy, the free-energy, etc. We
shall apply this tool to the study of two standard problems, with definitions
recalled below: random K-satisfiability (K-sat) [10] and the Viana-Bray dilute
spin-glass [5].



The method is similar to one of the techniques used by Weigt and Hartmann [11]
in their study of the vertex-cover problem on a random graph. The application
to other dilute systems is straightforward. Some of the advantages of this method
with respect to replicas are: it allows one to compute the corrections to the
thermodynamic limit in a simple way; it allows one to pinpoint a possible limitation
of the replica symmetric (RS) Ansatz in the satisfiable and paramagnetic (PM) phases
of dilute disordered systems; and, no less importantly, it can be straightforwardly
adapted to study the evolution of some of the algorithms developed to analyze K-sat
numerically [12].



The paper is organized as follows. In Section 2 we define the random K-sat
problem and recall its main properties. In Section 3 we define the clusters, as well
as several useful notions associated with them, and we introduce the cluster
expansion. Section 4 is devoted to the explicit calculation of the entropy of K-sat.
The finite N corrections are also described. In Section 5 we discuss the interplay
between the percolation and easy-to-hard transitions. We underline the consequences
of this calculation regarding the validity of the RS Ansatz in the satisfiable and
PM phases. As an application to a physical system, we discuss the calculation of the
paramagnetic (PM) free-energy of the Viana-Bray [5] dilute spin-glass in Section 6.
Finally, in Section 7 we present our conclusions.



2 K-satisfiability 

The theory of complexity has been developed to characterize worst-case instances
of hard computational problems [13]. A classification scheme, according to the time
needed to find solutions with the best performing algorithms, or to prove that a
problem is not solvable, is one of the outcomes of these studies. Of particular
importance is the problem of K-satisfiability [10, 14, 15] (K-sat) that has been used
as a testing ground for these theories.

However, it has been recently realized that in many interesting cases in computer
science, it is more relevant to determine the properties of typical, and not worst,
realizations of a given problem [16]. Random K-sat, defined as the ensemble of
randomly generated instances of K-sat, is the paradigm and the goal is now to
predict the behavior of a typical element of the ensemble.

The relation between phase transitions, or threshold phenomena, and intractability
in random combinatorial problems has been stressed by several authors [17].
Problems that are very hard to solve in the worst case are not so in the typical
case, unless the control parameter takes values within a finite interval that defines
the critical region. Away from the critical region, simple algorithms are capable of
finding a solution, or showing that there is no solution, in polynomial time. Random
K-sat has a well-defined threshold phenomenon.

Random K-sat, as well as other random optimization problems, can be mapped
onto disordered spin models. The mapping is done by associating the cost function
in the optimization problem to an energy density in the physical system [?].
Consequently, the random character associated with the choice of different instances
in the optimization problem translates into quenched disordered interactions in the
physical system. The most interesting optimization problems like K-sat become
spin-glass models of a particularly difficult type, where each spin interacts with a
finite number of other spins in the sample. These are called dilute spin-glasses and
they are interesting per se since they appear as a case of intermediate difficulty
between solvable mean-field spin-glasses and realistic finite dimensional ones.

The quest for the threshold value of the control parameter then becomes a search
for a phase transition. Thus, all tools developed to treat disordered physical systems
in statistical mechanics [18] can be adapted to study random optimization problems.
In the context of random K-sat two main techniques have been used so far: the
replica approach in the thermodynamic limit [?] and numerical simulations
complemented by finite size scaling when the number of variables remains finite [?].
The same two techniques are used in the study of dilute spin-glasses.

Random K-sat is defined as follows. Consider N Boolean variables, $\{x_1, \dots, x_N\}$,
that can take two logical values $x_i$ = TRUE or FALSE, for each $i$. Firstly, choose K
indices from the set of N elements, $i = 1, \dots, N$. Secondly, assign to each of these
indices the literal $x_i$, or its negation $\bar{x}_i$, with equal probability $p = 1/2$. Thirdly,
construct a clause $C_l$ as the logical OR ($\vee$) of the K previously determined literals.
If K = 3 and N = 10 a possible clause is $x_1 \vee \bar{x}_4 \vee x_7$. New clauses are generated in
identical manner, independently of the previous ones. One usually calls M the total
number of clauses. A formula F is the logical AND ($\wedge$) of M such clauses. It reads

$$ F = \bigwedge_{l=1}^{M} \left( \bigvee_{i=1}^{K} z_i^{(l)} \right) , \qquad (1) $$

where $z_i^{(l)} \in \{x_1, \bar{x}_1, \dots, x_N, \bar{x}_N\}$. A solution, if it exists, is an assignment of the N
variables that satisfies F, that is to say, for which all clauses are verified
simultaneously.
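The generation procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the simulations reported later: the names `random_ksat` and `count_solutions`, the 0-based indices and the `(index, negated)` encoding of a literal are choices made here for the example, and the exhaustive count is feasible only for small N.

```python
import itertools
import math
import random

def random_ksat(n_vars, n_clauses, k, rng):
    """Draw n_clauses clauses; each picks k distinct variables and negates
    each of them independently with probability 1/2.

    A literal is a pair (index, negated) with index in 0..n_vars-1.
    """
    formula = []
    for _ in range(n_clauses):
        indices = rng.sample(range(n_vars), k)
        formula.append([(i, rng.random() < 0.5) for i in indices])
    return formula

def count_solutions(formula, n_vars):
    """Count the assignments that satisfy every clause (exhaustive enumeration)."""
    count = 0
    for assignment in itertools.product([False, True], repeat=n_vars):
        # A literal is true when the variable's value differs from its "negated" flag.
        if all(any(assignment[i] != neg for i, neg in clause) for clause in formula):
            count += 1
    return count

rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
formula = random_ksat(n_vars=10, n_clauses=5, k=3, rng=rng)
n_solutions = count_solutions(formula, 10)
entropy = math.log(n_solutions) if n_solutions else float("-inf")
```

With M = 5 clauses on N = 10 variables the ratio is a = 0.5, deep in the satisfiable phase for K = 3, so `entropy` is the logarithm of a large number of solutions.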

Note that in the process of generation of a clause, two random processes inter- 
vene. In the first one, one selects the variables, in the second one, once the variables 
have been chosen, one determines the requirements that will be imposed on them. 
We shall later take advantage of this two-step process to perform the average over 
disorder in a convenient order. 

It is clear that if $M \ll N$ it will be very easy to find a solution to F. On
the contrary, if $M \gg N$, it will be extremely difficult to satisfy all requirements
simultaneously. Indeed, a well-defined critical value $a_c(K)$ of the parameter
$a = M/N$ appears when $M \to \infty$ and $N \to \infty$ with their ratio $a$ kept fixed. This
limit corresponds to a thermodynamic limit, in the physical language. A threshold
phenomenon, reminiscent of a phase transition, is observed: for $a < a_c(K)$ all
formulae have at least one solution with probability one, whereas for $a > a_c(K)$ any
formula has no solution with probability one.

Different values of K lead to different critical behaviors. When K = 1, the model
is unsatisfiable for all finite values of $a$, i.e. $a_c(K=1) = 0$. When K = 2, K-sat has
a continuous phase transition at $a_c(K=2) = 1$. This is a rigorous result proven
by using a mapping onto a directed graph problem [14]. For $K \geq 3$ only numerical
estimates for $a_c(K \geq 3)$ and approximate results obtained with the replica method
are available [?]; these yield $a_c(K=3) \simeq 4.2$.

The replica method is a powerful tool of statistical mechanics that allows one to 
compute the statistical properties of a disordered physical problem in equilibrium 
with a thermal environment. In order to use it to study optimization problems in 
general, and K-sat in particular, one first maps the optimization problem onto a 
statistical mechanics model. In the case of K-sat, the physical model is a spin-glass 
model with dilute interactions of random sign. Indeed, a natural representation of
any K-sat formula is obtained by introducing an $M \times N$ matrix $C_{li}$, whose elements
are

$$ C_{li} = \begin{cases} 0 & \text{if neither } x_i \text{ nor } \bar{x}_i \in C_l , \\ 1 & \text{if } x_i \in C_l , \\ -1 & \text{if } \bar{x}_i \in C_l . \end{cases} \qquad (2) $$

The random generation of clauses is equivalent to a uniform distribution of the
matrices $C_{li}$ that satisfy the constraints $\sum_i C_{li}^2 = K$, $\forall\, l$.

A cost function for K-sat is given by the number of unsatisfied clauses in a given
formula. If one identifies the logical state $x_i$ = TRUE with a spin $S_i = 1$ and the
logical state $x_i$ = FALSE with a spin $S_i = -1$, it is then easy to verify that the
following expression counts the number of unsatisfied clauses

$$ E[\{C_{li}\}, \{x_i\}] = \sum_{l=1}^{M} \delta\!\left( \sum_{i=1}^{N} C_{li} S_i \, ; \, -K \right) , \qquad (3) $$

where $\delta(\cdot\,;\cdot)$ is the Kronecker delta function. Using a polynomial representation
of $\delta$ this expression can be rewritten as the total energy of a sum of dilute p
spin-glass models in a random field (several values of p intervene, how many depends on
the value of K) [?].
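The counting rule in Eq. (3) can be checked on a single clause. The sketch below is an illustration only; `clause_matrix` and `energy` are hypothetical helper names, and literals are encoded as `(index, negated)` pairs with 0-based indices, which is an assumption of this example.

```python
def clause_matrix(formula, n_vars):
    """Build the M x N matrix of Eq. (2): +1 if x_i is in clause l,
    -1 if its negation is, 0 otherwise."""
    matrix = []
    for clause in formula:
        row = [0] * n_vars
        for i, negated in clause:
            row[i] = -1 if negated else 1
        matrix.append(row)
    return matrix

def energy(matrix, spins, k):
    """Number of unsatisfied clauses, Eq. (3): clause l is violated exactly
    when sum_i C[l][i] * S_i equals -K, i.e. when all K literals are false."""
    return sum(1 for row in matrix
               if sum(c * s for c, s in zip(row, spins)) == -k)

# Single clause x_1 OR (NOT x_2) OR x_3 on three variables.
formula = [[(0, False), (1, True), (2, False)]]
C = clause_matrix(formula, n_vars=3)

violated = energy(C, spins=(-1, 1, -1), k=3)   # x_1 = F, x_2 = T, x_3 = F
satisfied = energy(C, spins=(1, 1, -1), k=3)   # x_1 = T satisfies the clause
```

Each literal contributes +1 to the inner sum when it is true and -1 when it is false, so the sum reaches -K only if every literal of the clause is false.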

Once the energy function is identified, one introduces a fictitious temperature T,
then computes the average free-energy with the help of the replica trick, and finally
takes the limit $T \to 0$ to study the ground state properties of the physical model.
This gives access to quantities such as, for example, the average entropy of the
satisfiable phase. This is defined as the average over disorder of the logarithm of the
number of solutions. One of the drawbacks of the replica method is that
an Ansatz is necessary to pursue the calculation. Even in the simplest phases, the
satisfiable one for K-sat, it is not easy to show that the simplest Ansatz, called
replica symmetric (RS), solves the problem exactly. Moreover, it has been proven
that in the unsatisfiable phase one has to go beyond the RS Ansatz and develop a
replica symmetry breaking (RSB) scheme. This is indeed a very difficult task since
the order parameters for dilute systems have a much more intricate structure than
for infinitely connected cases [5, 7, 23]. Recent progress in this direction has been
presented in Ref. [24].

In this paper we re-derive a generic expression for the entropy of K-sat using a 
very simple method that avoids the use of replicas. Furthermore, the method allows 
us to compute the finite-size corrections. Our derivation gives information about 
the domain of validity of Monasson and Zecchina's conjecture that the RS Ansatz 
is exact in the satisfiable phase. We explain the expansion using the formalism of 
K-sat but the line of reasoning can be applied to any dilute system in the dilute 
regime. In Section 6 we shall analyze the Viana-Bray model along the same lines.



3 The method 

Let us start this Section by setting the notation and defining a set of notions that 
will be used later. 

Given a formula F of K-sat, two variables $x_i$ and $x_j$ are called adjacent if there
is at least one clause in F in which both $x_i$ and $x_j$ appear, irrespective of whether
they are negated or not. Two variables are connected if and only if there is a
path of adjacent variables between them. A cluster is a set of connected variables
that are disconnected from all others. Let us label with an integer r the different
clusters of the formula F, $r = 1, \dots, \mathcal{N}_c(F)$, where $\mathcal{N}_c(F)$ is the total number of
clusters in F. We shall call $n_0(F)$ the number of variables that do not belong to any
cluster.

These definitions are very easy to picture. For instance, take a 3-sat problem
with ten variables, $i = 1, \dots, 10$, that is defined by the formula
$F = (x_1 \vee \bar{x}_2 \vee x_3) \wedge (x_3 \vee x_4 \vee x_5) \wedge (x_6 \vee x_7 \vee x_8)$.
The variables i = 9 and i = 10 do not belong to any
cluster, thus $n_0(F) = 2$. A graphical representation of each clause is very useful.
We associate a point to each variable. Each clause is represented by a star with K
legs, 3 in the example, with endpoints that represent the variables. In the formula
F there are two clusters, $\mathcal{N}_c = 2$, that link $i = 1, \dots, 5$ and $i = 6, 7, 8$, respectively.
When a variable appears in two (or more) clauses it will be shared by two stars.
This is the case in the cluster on the left of Fig. 1. More complicated structures are
possible, particularly when N and M are large. The assignment $x_i$ ($\bar{x}_i$) of the ith
literal in a clause can be represented with a plus (minus) sign on its leg. These are
the signs in Fig. 1. In this way, a one-to-one correspondence between formulae and
graphs is constructed.
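The cluster decomposition of the ten-variable example can be reproduced with a small union-find routine. This is a sketch for illustration; `find_clusters` is a hypothetical helper, and only the variable indices of each clause are needed since the signs of the literals play no role in the topology.

```python
def find_clusters(clauses, n_vars):
    """Group variables 1..n_vars into clusters.

    clauses: tuples of variable indices (signs are irrelevant here).
    Returns (clusters, isolated): clusters is a sorted list of sorted index
    lists; isolated lists the variables that appear in no clause.
    """
    parent = list(range(n_vars + 1))  # index 0 is unused

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    in_clause = set()
    for clause in clauses:
        in_clause.update(clause)
        root = find(clause[0])
        for v in clause[1:]:
            parent[find(v)] = root  # all variables of a clause are adjacent

    groups = {}
    for v in sorted(in_clause):
        groups.setdefault(find(v), []).append(v)
    clusters = sorted(groups.values())
    isolated = [v for v in range(1, n_vars + 1) if v not in in_clause]
    return clusters, isolated

clauses = [(1, 2, 3), (3, 4, 5), (6, 7, 8)]
clusters, isolated = find_clusters(clauses, n_vars=10)
```

For this formula the routine returns two clusters, one containing variables 1 to 5 and one containing 6, 7, 8, and the two isolated variables 9 and 10, in agreement with the counts quoted in the text.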

When $a$ is small, the typical cluster size is expected to be small as well, as
there are far fewer clauses than variables. Indeed, for K = 2 this problem is
the one of percolation in an infinite dimensional space, also known as the random
graph. Many properties of this model are known [25], among which the fact that for
$a < 1/2$ all variables belong to clusters of size at most proportional to $\ln N$, in the
thermodynamic limit. When $a$ crosses 1/2 a giant cluster containing a finite fraction
of the variables grows continuously. For $K \geq 3$ the equivalent geometrical problem
relies on the theory of hyper-graphs, for which less is known. The percolation occurs
at $a = a_p = 1/(K(K-1))$ [26]. In these two cases the system percolates well before
it becomes unsatisfiable, $a_p < a_c$. Indeed the appearance of a contradiction requires
an intricate structure in the giant cluster.

Figure 1: A graphical representation of the formula
$F = (x_1 \vee \bar{x}_2 \vee x_3) \wedge (x_3 \vee x_4 \vee x_5) \wedge (x_6 \vee x_7 \vee x_8)$.
See the text for details.

Let us define the ground state entropy of a formula, $S_{GS}(F)$, as the logarithm of
the number of assignments of the variables that minimize the number of violated
clauses. If F is satisfiable, $S_{GS}(F)$ is the logarithm of the number of solutions of F.
It is clear from the clusters' definition that $S_{GS}$ is the sum of contributions of the
different independent sub-formulae:

$$ S_{GS}(F) = n_0(F) \ln 2 + \sum_{r=1}^{\mathcal{N}_c(F)} S_{GS}(F_r) . \qquad (4) $$

We are interested in the entropy averaged over the ensemble of formulae, $\overline{S_{GS}}$. We
shall henceforth denote ensemble averages with an over-bar. As stressed in Section 2
this average is twofold. Indeed, clusters can be separated into ensembles with the
same topology, ignoring for the moment the sign assignment of the literals. Thus,
the averaging proceeds in two steps; one first chooses the topology of the cluster,
with its associated probability, and then one averages over the two possibilities for
each literal in the cluster. For a given cluster, once the latter average is performed,
the entropy depends only on the topology of the cluster. This remark allows us
to rewrite the average of the sum in Eq. (4) in a more convenient manner. If we
introduce a new integer t that labels all possible topologies, and $n_t(F)$ and $\langle S_t \rangle$ the
number of t-like clusters in formula F and the average over the sign assignment of
the entropy of the t-like clusters, respectively, we arrive at the following expression
for the averaged entropy:

$$ \overline{S_{GS}} = \sum_t [n_t] \langle S_t \rangle . \qquad (5) $$

We have here included the isolated variables in the sum, associating them to the
index t = 0, $S_0 = \ln 2$, and we denote with $[n_t]$ the average number of t-like clusters.

A more convenient expression for $[n_t]$ can now be worked out. Let us call $X_i^t(F)$
the function which takes the value 1 if the variable i belongs to a t-like cluster of
the formula F and 0 otherwise; let $L_t$ be the number of variables in such a cluster.
Then

$$ n_t(F) = \frac{1}{L_t} \sum_i X_i^t(F) , \qquad (6) $$

that implies

$$ [n_t] = \frac{N P_t}{L_t} , \qquad (7) $$

where $P_t = [X_i^t]$ is the probability that a given variable belongs to a t-like cluster.
Finally,

$$ \frac{\overline{S_{GS}}}{N} = \sum_t \frac{P_t \langle S_t \rangle}{L_t} . \qquad (8) $$

$P_t$ and $\langle S_t \rangle$ can now be obtained using elementary combinatorial arguments and
simple enumeration.

This formulation can be adapted to any quantity for which the clusters contribute 
additively, the free-energy for instance, and to other dilute problems where there is 
also a decoupling in the randomness between a geometrical part and an interaction 
one, as in the Viana-Bray model [5].



4 Cluster expansion of the K-SAT entropy 

In this Section we shall apply the cluster expansion to the calculation of the average 
entropy of random K-sat. For our present purposes K= 1-sat is not interesting since 
it is unsatisfiable for all finite values of a. We shall then start by analyzing in detail 
K= 2-sat. Afterwards, we shall discuss how the approach generalizes to larger values 
of K. 

4.1 K= 2-sat in the thermodynamic limit 

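For a cluster of n variables connected by p distinct clauses, the probability $P_t$ reads

$$ P_t = \binom{M}{p}\, p! \left( \frac{2}{N(N-1)} \right)^{p} \left( \frac{(N-n)(N-n-1)}{N(N-1)} \right)^{M-p} \binom{N-1}{n-1}\, (n-1)!\, K_t . \qquad (9) $$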



Let us briefly describe the origin of the factors in this equation. Each of the p clauses
is chosen with probability $2/(N(N-1))$ at each of the M steps in the formula
generation process. For the variables belonging to the cluster to be disconnected
from all other sites, the $M - p$ other clauses must belong to the set of the
$(N-n)(N-n-1)/2$ clauses connecting the other sites. The first two factors come from
the possible permutation of the p steps where the considered clauses appear. The
last three factors arise from the freedom in the choice of $n - 1$ sites connected to
the chosen site. In particular, $K_t$ is a symmetry factor that equals the number of
distinct labellings of the n sites, divided by $(n-1)!$. Note that two labellings which
lead to the same set of clauses are not distinct: for the linear three site cluster
1-2-3 and 3-2-1 correspond to the same labelling, with clauses (12) and (23).
But 1-2-3 and 2-1-3 are distinct.

Figure 2: Tree-like clusters that contribute to K = 2-sat.

type   $L_t$   $K_t$    $\langle S_t \rangle$
a      1       1        ln 2
b      2       1        ln 3
c      3       3/2      (2 ln 2 + ln 5)/2
d      4       2        (3 ln 2 + ln 5 + 2 ln 7)/4
e      4       2/3      (3 ln 2 + 5 ln 3)/4
f      5       5/2      (4 ln 2 + 6 ln 3 + ln 5 + 2 ln 11 + ln 13)/8
g      5       5/2      (9 ln 2 + 2 ln 5 + 2 ln 7 + ln 11 + ln 13)/8
h      5       5/24     (13 ln 2 + 4 ln 5 + ln 17)/8

Table 1: The contributions of the clusters in Fig. 2.
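The sign-averaged entropies of the smallest clusters in Table 1 can be reproduced by brute force. The sketch below is illustrative: `mean_entropy` is a hypothetical helper, clusters are described by their list of 2-sat clauses as pairs of 0-based variable indices, and it is assumed here that types d and e are the four-site chain and star respectively (consistent with their symmetry factors 2 and 2/3).

```python
import itertools
import math

def mean_entropy(edges, n_vars):
    """Average of ln(number of solutions) over all sign choices of the literals.

    edges: list of 2-sat clauses given as pairs (i, j) of variable indices.
    """
    signings = list(itertools.product([False, True], repeat=2 * len(edges)))
    total = 0.0
    for negated in signings:
        count = 0
        for x in itertools.product([False, True], repeat=n_vars):
            # Clause e is satisfied if at least one of its two literals is true.
            if all(x[i] != negated[2 * e] or x[j] != negated[2 * e + 1]
                   for e, (i, j) in enumerate(edges)):
                count += 1
        total += math.log(count)
    return total / len(signings)

pair = mean_entropy([(0, 1)], 2)                      # type b
chain3 = mean_entropy([(0, 1), (1, 2)], 3)            # type c
chain4 = mean_entropy([(0, 1), (1, 2), (2, 3)], 4)    # type d (assumed chain)
star4 = mean_entropy([(0, 1), (0, 2), (0, 3)], 4)     # type e (assumed star)
```

The four values agree with the corresponding rows of Table 1; larger clusters can be treated in the same way at a cost that grows exponentially with the number of sites and clauses.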

In the thermodynamic limit $N \to \infty$ with $a$ fixed, and for n and p finite, this
expression is proportional to $N^{n-1-p}$ ($p \geq n - 1$). It is then finite only if $p = n - 1$,
that is to say for tree-like clusters. This justifies the choice of distinct clauses. In
this limit, for $p = L_t - 1$ and $n = L_t$, this expression simplifies greatly:

$$ P_t = (2a)^{L_t - 1} e^{-2 L_t a} K_t . \qquad (10) $$

The different clusters considered in the expansion are represented in Fig. 2. For each
type, the relevant quantities, obtained by basic enumeration, are given in Table 1.
For instance, the average entropy of the linear three-site cluster is made of two
parts: if the clauses require the same sign for the central variable, one can find five
solutions of the formula; if the clauses are contradictory for the central variable,
there are only four solutions.

Expanding in $a$ up to $O(a^4)$ we obtain

$$ \frac{\overline{S_{GS}}}{N} \simeq \ln 2 + a \ln\frac{3}{4} + a^2 \ln\frac{80}{81} + \frac{a^3}{3} \ln\frac{3^{29}\, 7^{6}}{5^{15}\, 2^{28}} + \frac{a^4}{12} \ln\frac{2^{225}\, 5^{160}\, 11^{36}\, 13^{24}\, 17}{3^{216}\, 7^{168}} . \qquad (11) $$



Monasson and Zecchina obtained this series by using the replica trick, with an RS
Ansatz, to average the free-energy of the physical model related to 2-sat [?]. The
averaged entropy follows from the averaged free-energy density that itself depends
on the probability distribution of the local fields. This quantity is determined by
an integro-differential equation that cannot be solved analytically. Monasson and
Zecchina developed a perturbative solution in $a$ that allowed them to derive a series
for $\overline{S_{GS}}/N$ that coincides, up to $O(a^4)$, with the one in Eq. (11). The perturbative
nature of this result is now clarified by the cluster analysis. Note that we
performed two expansions to obtain this series: the cluster enumeration and an
expansion in powers of $a$ of the exponentials in $P_t$. We shall further discuss this issue
in Section 5.
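The two expansions mentioned above (cluster enumeration, then expansion of the exponentials) can be carried out with exact rational arithmetic. The following sketch assumes the tree weights $P_t = (2a)^{L_t-1} e^{-2 L_t a} K_t$ and the entries of Table 1; the name `entropy_series` and the representation of each $\langle S_t \rangle$ as a map from a prime p to the coefficient of ln p are choices made for this example.

```python
from fractions import Fraction as F
from math import factorial

# Table 1 rows: (L_t, K_t, <S_t> written as {prime: coefficient of ln(prime)}).
TABLE1 = [
    (1, F(1), {2: F(1)}),
    (2, F(1), {3: F(1)}),
    (3, F(3, 2), {2: F(1), 5: F(1, 2)}),
    (4, F(2), {2: F(3, 4), 5: F(1, 4), 7: F(1, 2)}),
    (4, F(2, 3), {2: F(3, 4), 3: F(5, 4)}),
    (5, F(5, 2), {2: F(4, 8), 3: F(6, 8), 5: F(1, 8), 11: F(2, 8), 13: F(1, 8)}),
    (5, F(5, 2), {2: F(9, 8), 5: F(2, 8), 7: F(2, 8), 11: F(1, 8), 13: F(1, 8)}),
    (5, F(5, 24), {2: F(13, 8), 5: F(4, 8), 17: F(1, 8)}),
]

def entropy_series(order):
    """coeffs[k][p] = coefficient of a**k * ln(p) in S_GS / N.

    Only valid up to order 4, since Table 1 stops at clusters of five sites.
    """
    coeffs = [dict() for _ in range(order + 1)]
    for L, K, S in TABLE1:
        for k in range(L - 1, order + 1):
            j = k - (L - 1)  # power of a taken from exp(-2 L a)
            weight = F(2) ** (L - 1) * F(-2 * L) ** j / factorial(j) * K / L
            for prime, c in S.items():
                coeffs[k][prime] = coeffs[k].get(prime, 0) + weight * c
    return coeffs

series = entropy_series(4)
```

For instance, `series[2]` is `{2: 4, 3: -4, 5: 1}`, that is $a^2 (4 \ln 2 - 4 \ln 3 + \ln 5) = a^2 \ln(80/81)$, and multiplying `series[4]` by 12 gives the exponents 225, -216, 160, -168, 36, 24 and 1 of the primes 2, 3, 5, 7, 11, 13 and 17.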



4.2 Finite size corrections to the entropy of K= 2-sat 



There are two kinds of finite size corrections to the expansion presented in Eq. (11).
On the one hand, the probability $P_t$ of a variable belonging to a tree-like cluster has
1/N corrections that can be simply computed from the general expression (9). On
the other hand, clusters that include loops also contribute to the 1/N corrections.

The expansion of expression (9) up to order 1/N for tree-like clusters with $n = L_t$
and $p = L_t - 1$ yields

$$ P_t = (2a)^{L_t - 1} e^{-2 L_t a} K_t \left[ 1 + \frac{1}{N} \left( - a L_t (L_t + 1) + \frac{(L_t - 1)(3 L_t + 2)}{2} - \frac{(L_t - 1)(L_t - 2)}{2a} \right) \right] . \qquad (12) $$

Clusters with $l$ loops contribute at order $1/N^l$. Hence, if we only wish to
compute the 1/N corrections we can content ourselves with clusters that have only
one loop. These have $L_t$ variables and also $L_t$ clauses. One obtains

$$ P_t = \frac{1}{N} (2a)^{L_t} e^{-2 L_t a} K_t , \qquad (13) $$

with $K_t$ defined as before and multiplied by 1/2 if there is a repeated clause. The
one-loop clusters that we considered are represented in Fig. 3.

Including the 1/N corrections in Eq. (12) and the ones stemming from the new
diagrams in Fig. 3 and Eq. (13), calculated with the results of Table 2, the correction
to $\overline{S_{GS}}/N$ reads

$$ \frac{1}{N} \left[ a \ln\frac{3^4}{2^4\, 5} + \frac{a^2}{4} \ln\frac{2^{107}\, 5^{56}}{3^{107}\, 7^{24}} + \frac{a^3}{2} \ln\frac{3^{193}\, 7^{156}}{2^{199}\, 5^{141}\, 11^{36}\, 13^{24}\, 17} \right] . \qquad (14) $$

Figure 3: Loop diagrams contributing to the 1/N corrections.

type   $L_t$   $K_t$    $\langle S_t \rangle$
a'     2       1/2      (3 ln 2 + ln 3)/4
b'     3       1/2      (9 ln 2 + 3 ln 3)/8
c'     3       3/2      (5 ln 2 + 4 ln 3 + ln 5)/8

Table 2: The contributions of the clusters represented in Fig. 3.

The correction in Eq. (14) has to be checked against an exhaustive numerical
evaluation of the entropy for small systems.
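The 1/N series can be assembled in the same exact-arithmetic fashion as the leading one. The sketch below rests on two assumptions that should be read as such: the tree weights carry the relative correction $[-aL_t(L_t+1) + (L_t-1)(3L_t+2)/2 - (L_t-1)(L_t-2)/(2a)]/N$ obtained by expanding the finite-N probability to first order in 1/N, and the one-loop weights are $(2a)^{L_t} e^{-2L_t a} K_t / N$. The names `finite_size_series`, `TREES` and `LOOPS` are hypothetical, and the result is meaningful only up to order $a^3$ because trees with six sites would enter at order $a^4$.

```python
from fractions import Fraction as F
from math import factorial

# (L_t, K_t, <S_t> as {prime: coefficient of ln(prime)}) from Tables 1 and 2.
TREES = [
    (1, F(1), {2: F(1)}),
    (2, F(1), {3: F(1)}),
    (3, F(3, 2), {2: F(1), 5: F(1, 2)}),
    (4, F(2), {2: F(3, 4), 5: F(1, 4), 7: F(1, 2)}),
    (4, F(2, 3), {2: F(3, 4), 3: F(5, 4)}),
    (5, F(5, 2), {2: F(4, 8), 3: F(6, 8), 5: F(1, 8), 11: F(2, 8), 13: F(1, 8)}),
    (5, F(5, 2), {2: F(9, 8), 5: F(2, 8), 7: F(2, 8), 11: F(1, 8), 13: F(1, 8)}),
    (5, F(5, 24), {2: F(13, 8), 5: F(4, 8), 17: F(1, 8)}),
]
LOOPS = [
    (2, F(1, 2), {2: F(3, 4), 3: F(1, 4)}),
    (3, F(1, 2), {2: F(9, 8), 3: F(3, 8)}),
    (3, F(3, 2), {2: F(5, 8), 3: F(4, 8), 5: F(1, 8)}),
]

def _add(coeffs, power, weight, entropy):
    """Accumulate weight * a**power * <S_t> if the power is within range."""
    if 0 <= power < len(coeffs):
        for prime, c in entropy.items():
            coeffs[power][prime] = coeffs[power].get(prime, 0) + weight * c

def finite_size_series(order=3):
    """coeffs[k][p] = coefficient of a**k * ln(p) in N * (1/N correction to S/N)."""
    coeffs = [dict() for _ in range(order + 1)]
    for j in range(order + 2):  # power of a taken from exp(-2 L a)
        for L, K, S in TREES:
            base = F(2) ** (L - 1) * F(-2 * L) ** j / factorial(j) * K / L
            power = L - 1 + j
            _add(coeffs, power + 1, -base * L * (L + 1), S)
            _add(coeffs, power, base * F((L - 1) * (3 * L + 2), 2), S)
            _add(coeffs, power - 1, -base * F((L - 1) * (L - 2), 2), S)
        for L, K, S in LOOPS:
            base = F(2) ** L * F(-2 * L) ** j / factorial(j) * K / L
            _add(coeffs, L + j, base, S)
    # Drop entries whose coefficient cancelled or was identically zero.
    return [{p: c for p, c in d.items() if c != 0} for d in coeffs]

correction = finite_size_series(3)
```

Doubling `correction[3]` yields the exponents -199, 193, -141, 156, -36, -24 and -1 for the primes 2, 3, 5, 7, 11, 13 and 17, and `correction[1]` corresponds to $a \ln(3^4/(2^4\, 5))$.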

4.3 $K \geq 3$-sat in the thermodynamic limit

The method described in detail for K = 2 can be used for any value of K. As
the graphical representation and the enumeration of clusters are, however, more
cumbersome than for K = 2 we shall present less detailed results for the case $K \geq 3$.

The probability for a given variable to be present in a cluster of $L_t$ variables that
are linked by p clauses is of order 1 in the thermodynamic limit only if
$p(K-1) = L_t - 1$, which is the tree-like condition for these hyper-graphs. If this holds,

$$ P_t = (a K!)^{p} e^{-L_t a K} K_t . \qquad (15) $$

In Fig. 4 we have drawn the diagrams leading to the main contributions for K = 3.
In the text we give the analytic expression for general K.
With these values we obtain

$$ \frac{\overline{S_{GS}}}{N} = \ln 2 + a \ln\left(1 - \frac{1}{2^K}\right) + \frac{a^2 K^2}{4} \left[ \ln\left(1 - \frac{1}{2^K - 1}\right) + \ln\left(1 - \frac{2^{K-1} - 1}{2^{K-1}(2^K - 1)}\right) - 2 \ln\left(1 - \frac{1}{2^K}\right) \right] + O(a^3) . \qquad (16) $$

Again we recover the RS result of Ref. [?]. The contributions to the finite size
corrections are similar to the ones discussed for 2-sat; we obtain at the leading order
in $a$:

$$ - \frac{1}{N} \frac{a K^2}{4} \left[ \ln\left(1 - \frac{1}{2^K - 1}\right) + \ln\left(1 - \frac{2^{K-1} - 1}{2^{K-1}(2^K - 1)}\right) - 2 \ln\left(1 - \frac{1}{2^K}\right) \right] . \qquad (17) $$

We have shown that the perturbative analysis of the RS Ansatz leads to a series
in $a$ whose leading orders coincide with the ones stemming from the finite cluster
expansion, for all K. In the next Section we shall discuss the relevance of the
contributions from the infinite cluster that appears at the percolation transition $a_p < a_c$.

Figure 4: Clusters yielding the leading contributions to K = 3-sat.

type   $L_t$     $K_t$                          $\langle S_t \rangle$
a''    1         1                              ln 2
b''    K         K/K!                           ln(2^K - 1)
c''    2K - 1    K^2 (2K - 1)/(2 (K!)^2)        (1/2) ln(1 + 2^K (2^{K-1} - 1)) + (1/2) ln(2^K (2^{K-1} - 1))

Table 3: The contributions of the clusters represented in Fig. 4.
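The second-order coefficient for general K can be checked numerically from the three clusters of Table 3 and the weights $P_t = (aK!)^p e^{-L_t a K} K_t$. The sketch below is an illustration; the function names are hypothetical, and the compact expression $(K^2/4) \ln[1 - 1/(2^K - 1)^4]$ used for comparison is an algebraic simplification worked out for this example rather than a formula quoted from the text.

```python
import math

def second_order_coefficient(k):
    """O(a^2) coefficient of S_GS / N built from clusters a'', b'', c''."""
    kfact = math.factorial(k)
    x = 2**k * (2 ** (k - 1) - 1)
    # Each entry: (L_t, number of clauses p, K_t, <S_t>).
    clusters = [
        (1, 0, 1.0, math.log(2)),
        (k, 1, k / kfact, math.log(2**k - 1)),
        (2 * k - 1, 2, k**2 * (2 * k - 1) / (2 * kfact**2),
         0.5 * (math.log(1 + x) + math.log(x))),
    ]
    total = 0.0
    for size, p, sym, entropy in clusters:
        j = 2 - p  # power of a supplied by exp(-L_t a K)
        total += (kfact**p * (-size * k) ** j / math.factorial(j)
                  * sym * entropy / size)
    return total

def closed_form(k):
    """Compact form of the same coefficient (derived here, see lead-in)."""
    return k**2 / 4 * math.log(1 - 1 / (2**k - 1) ** 4)

k2 = second_order_coefficient(2)  # should reproduce ln(80/81) of the 2-sat series
```

For K = 2 the coefficient reduces to $\ln(80/81)$, the second-order term of the 2-sat series, which gives a simple consistency check between the general-K table and the K = 2 enumeration.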




5 Discussion 

The domain of validity of our expansion, and of the results obtained with the replica
method [?], can be clarified by studying the percolation phenomenon in detail.

The series expansion in (8) is ordered following the index t that is directly related
to the size of the clusters. The dependence on $a$ is involved since the coefficients $P_t$
are $a$-dependent via an exponential times a power. The expansion of the exponential
factors in powers of $a$ leads to a rearrangement of the series in powers of $a$.

The range of validity of both series is not obvious. We can start by analysing the
simpler series $\sum_t P_t$ that should count the total fraction of sites and be identical to 1.
For K = 2, its direct summation yields 1 for $a < a_p$ and $1 - P$ for $a > a_p$, where P is
the nonzero solution to $1 - P = e^{-2aP}$ (see Eq. (18) below). Thus, it fails above $a_p$ because
of the emergence of a giant cluster at the percolation transition. Instead, if one
expands the exponentials in powers of $a$, the result is $\sum_t P_t = 1 + 0 \times a + 0 \times a^2 + \dots$.
The sum yields 1 independently of $a$ even beyond the percolation threshold. The
rearrangement in powers of $a$ captures the correct behavior of this quantity.
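The behavior of the direct summation can be illustrated numerically for K = 2. The sketch below groups the tree topologies by their number of sites n and uses the fact that the symmetry factors of all trees with n sites add up to $n^{n-2}/(n-1)!$ (Cayley's count of labelled trees divided by $(n-1)!$, consistent with the totals 1, 1, 3/2, 8/3 and 125/24 of Table 1). The function name and the truncation `n_max` are choices made for this example, and the sum converges slowly close to $a = 1/2$.

```python
import math

def fraction_in_finite_trees(a, n_max=5000):
    """Direct sum over tree clusters of P_t for K = 2, grouped by size n:
    sum_n n**(n-2)/(n-1)! * (2a)**(n-1) * exp(-2 n a)."""
    total = 0.0
    for n in range(1, n_max + 1):
        # Work with logarithms to avoid overflow in n**(n-2) and (n-1)!.
        log_term = ((n - 2) * math.log(n) - math.lgamma(n)
                    + (n - 1) * math.log(2 * a) - 2 * n * a)
        total += math.exp(log_term)
    return total

below = fraction_in_finite_trees(0.25)  # a below the percolation threshold 1/2
above = fraction_in_finite_trees(0.75)  # a above the percolation threshold
```

Below the threshold the sum returns 1 to numerical accuracy, while at a = 0.75 it falls to about 0.417, the fraction of sites left outside the giant cluster.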

One can conjecture that the rearrangement yields the exact result for all $a$ for
all quantities that depend mainly on the locally tree-like structure of the percolating
cluster, and only weakly on its loops, which show up only on a scale of order $\ln N$.
As can be seen in Eq. (9), the exponentials in $P_t$ arise from the requirement that
the considered cluster is disconnected from the rest of the sites. Expanding the
exponentials amounts basically to assuming that a sub-tree of the giant cluster
gives the same contribution as its disconnected counterpart. The alternating signs
arise from the need to avoid the double counting of the smaller clusters contained
in the giant cluster.

Regarding the calculation of the averaged entropy, if one could compute all
the terms in (8) and sum the series, the result would be exact below the percolation
threshold, $a_p = 1/(K(K-1))$. Indeed all sites belong to clusters of size at
most proportional to $\ln N$ in this regime. As soon as $a$ goes beyond $a_p$, the direct
summation of (8) should fail.

In spite of the discussion of the next to last paragraph, we do not expect that 
reordering the series in powers of a gives the correct result for the entropy. Our 
argument is based on the drastic influence of loops on this quantity. Let us compare 
the entropy of a loop and of a linear cluster of the same size, for 2-sat. There are 
roughly twice as many solutions for the linear cluster as for the loop, as one does 
not require the ending variables to be the same. The difference of entropy between 
the two should then be finite. As there are an extensive number of loops in the 
percolating cluster, one can expect a finite deviation in the average entropy per 
site between the result assuming a tree-like structure of the giant cluster (i.e. the
expansion in powers of $a$ of the original series) and the correct one.

We have examined these issues with the help of numerical simulations of systems 
with small sizes. 

Firstly, we compare the averaged total entropy per degree of freedom, $\overline{S}/N$,
to the value predicted by the series expansion once reordered in powers of $a$. In
Fig. 5 we plot $\overline{S}/N$ against 1/N (line with + symbols) for K = 2-sat problems with N =
16 (50000), 20 (50000), 24 (50000), 28 (10000), 32 (15000), 36 (10000) and
$a_p < a = 0.75 < a_c$. For each sample we computed the entropy by exhaustive enumeration.
The numbers in parentheses are the numbers of realizations of random instances
of K-sat used to compute the averages. The agreement with the analytical prediction of
the truncated series expansion in the thermodynamic limit (horizontal line below)
and including 1/N corrections (tilted line above) is very good, within 0.3%. We have
also computed the variance, $(\overline{S^2} - \overline{S}^2)/N$, and checked that it is in good agreement
with the analytical result, $0.025\, a^2$, that we obtained with an extension of the cluster
expansion described in previous sections.

Even if the agreement between numerical results and theory is almost perfect for
these small sizes, a more careful inspection of the different contributions to the total
entropy shows that one could expect important deviations for increasing N. We have
computed separately the contribution of the largest cluster. Figure 6 shows the
results. We plot there, with crosses, the averaged entropy of the largest cluster, per
degree of freedom, $\overline{S_{MAX}}/N$. Its contribution, even for these small sizes, represents
roughly half of the total entropy (note that below the percolation threshold the
contribution of the largest cluster is much smaller) and does not seem to vanish
in the thermodynamic limit. A simple linear fit of $\overline{S_{MAX}}/N$ yields, when $N \to \infty$,
the finite limit 0.1906 ± 0.0008. This may be taken as a guess for a lower limit for
the contribution of the largest cluster.

Figure 5: The averaged entropy per site for K = 2 and a = 0.75. The sizes
are N = 16, 20, 24, 28, 32, 36. The lower straight line indicates the value given by
the truncated series expansion found with the replica method in the thermodynamic
limit (up to $O(a^8)$) and the upper curve with line-points includes the 1/N corrections,
for each value of N.

In the same figure we compare the averaged entropy of the largest cluster, $\overline{S_{MAX}}/N$,
to the "factorized" quantity $(\overline{L_{MAX}}/N)\, \overline{S_{MAX}/L_{MAX}}$, where $L_{MAX}$ is the number of
variables in the largest cluster, that is represented with squares. For finite sizes we have
shown that these two quantities coincide. The good agreement between the two curves
suggests that the factorization also holds when $N \to \infty$.

This observation suggests an improvement of the finite-size numerical study. If
we assume that the factorization holds in the limit $N \to \infty$ for the percolating
cluster, we can then replace the fraction of sites that belong to the largest cluster,
$P_{\rm MAX} = L_{\rm MAX}/N$, by its analytical value in the thermodynamic limit. This is given
by

$$1 - P_{\rm MAX} = \exp\left( \alpha K \left( (1 - P_{\rm MAX})^{K-1} - 1 \right) \right) . \qquad (18)$$

This result is obtained using a self-consistent equation on the generating function
that counts the number of sites in finite-size clusters [29]. For $K = 2$ and $\alpha = 0.75$,
$P_{\rm MAX} \approx 0.5828$. Figure 7 displays $P_{\rm MAX}$ as a function of $\alpha$ for $K = 2$ and different
system sizes, $N = 25, 100, 1000, 10000$. One sees that the percolation transition is
reached only for sizes, $N \sim 10000$, that are far beyond the largest samples
for which one can compute the entropy by exhaustive enumeration. Moreover, it
is important to notice that the approach to the asymptotic value is non-monotonic:
for these values of $N$ the asymptote is approached from below, while
one can easily prove that $P_{\rm MAX}(N = 4, \alpha = 0.75) \approx 0.75$.
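As a quick numerical check of the thermodynamic-limit value quoted above, the self-consistent equation (18), $1 - P = \exp(\alpha K((1-P)^{K-1} - 1))$, can be iterated directly. The sketch below is only illustrative: the function name `giant_fraction` and the choice of a fixed-point iteration on $u = 1 - P$ started from $u = 0$ are assumptions made here, not part of the original study.

```python
import math


def giant_fraction(K, alpha, tol=1e-14, max_iter=100_000):
    """Fraction P of sites in the giant cluster, from
    1 - P = exp(alpha * K * ((1 - P)**(K - 1) - 1)).

    Iterates u -> exp(alpha*K*(u**(K-1) - 1)) with u = 1 - P, starting at u = 0.
    The map is increasing in u, so the iterates grow monotonically towards the
    smallest fixed point in [0, 1]; u = 1 (no giant cluster) is always a fixed point.
    """
    u = 0.0
    for _ in range(max_iter):
        u_next = math.exp(alpha * K * (u ** (K - 1) - 1.0))
        if abs(u_next - u) < tol:
            u = u_next
            break
        u = u_next
    return 1.0 - u


if __name__ == "__main__":
    print(giant_fraction(2, 0.75))  # close to the value 0.5828 quoted in the text
    print(giant_fraction(2, 0.40))  # below alpha_p = 0.5: essentially zero
```

Starting from $u = 0$ selects the non-trivial solution whenever one exists; below the percolation threshold the iteration converges to $u = 1$, that is $P = 0$.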

Finally, the third curve in Fig. 6 (stars) represents the "improved" contribution of
the largest cluster, $0.5828\, S_{\rm MAX}/L_{\rm MAX}$. For $N > 20$ the improved curve is still higher
than the actual one, and a linear fit yields the limiting value $\lim_{N \to \infty} S_{\rm MAX}/N \approx
0.2143 \pm 0.0003$.

This numerical analysis, although highly speculative and based on simulations of very
small systems, suggests that the contribution of the percolating cluster is finite
in the thermodynamic limit. On the analytical side, it might be possible to compute
the entropy of the percolating cluster, at least for 2-sat, by taking advantage of the
numerous mathematical results on the structure of random graphs p5| . Such a 
study is necessary to settle the problem. 

This picture could also explain the discrepancy between numerical studies and 
a rigorous bound for the value of the exponent governing the size of the scaling 
window of the satisfiability transition PU|, . For the relatively small system sizes 



first studied, the observed exponent may have been altered by percolation, whose
asymptotic regime had not yet been reached [19]. At larger sizes, and in mathematical
studies, the exponent is purely due to the satisfiability threshold inside the giant 
cluster; one might observe the true exponent for lower system sizes by studying the 
probability of the largest cluster to be satisfiable. 

Let us summarize briefly the route followed in studies based on the replica trick 
to attempt to identify the effect of the percolation transition within this calcula- 
tion || . In the RS Ansatz, an intricate integral equation over the probability of local 
fields, $P(h)$, which is the order parameter of the system, is obtained. From $P(h)$ all
thermodynamic quantities follow, including the entropy of the satisfiable phase. As
the analytical solution of the equation that determines $P(h)$ seems out of reach,
it has been solved order by order in $\alpha$. This yields an $\alpha$ expansion for the entropy
that coincides exactly with ours up to $O(\alpha^4)$ for $K = 2$ and up to $O(\alpha^2)$ for
$K \geq 3$. It seems natural to assume that the two expansions coincide to all orders.
The RS Ansatz is thus proven to be exact for all $\alpha$ such that $\alpha < \alpha_p = 1/(K(K-1))$.
Beyond this value, if the influence of the loops on the entropy of the percolating
cluster is not negligible, as discussed above, two possibilities arise: either the
RS Ansatz is wrong, or a more refined treatment of the integral equation for $P(h)$
is required [32]. A careful analysis of this problem is worthwhile. One could, for
instance, investigate the presence of a singularity at the percolation threshold. In
any case, the fact that the entropy of the satisfiable phase remains finite up to the
satisfiability transition is confirmed: at the threshold a finite, even if small ($\approx 0.2$
for $K = 2$), fraction of sites are in finite-size clusters, and their contribution provides
a lower bound for the entropy of the system.
Figure 6: The averaged entropy of the largest cluster per variable, $S_{\rm MAX}/N$,
the factorized average $(S_{\rm MAX}/L_{\rm MAX})\,(L_{\rm MAX}/N)$ and the semi-analytical prediction
$P_{\rm MAX}\, S_{\rm MAX}/L_{\rm MAX}$ as a function of $1/N$ for $\alpha = 0.75$ and $K = 2$.




Figure 7: $K = 2$. The fraction of sites belonging to the largest cluster. The
asymptotic limit $N \to \infty$ is approximately reached for $N \sim 10000$, where $L_{\rm MAX}/N$
is "finite" only above the percolation threshold $\alpha_p(K = 2) = 0.5$. The analytical
expression is given by the solution to Eq. (18), and it fits the data very accurately
for $\alpha > 0.6$ as soon as $N > 1000$. We have verified that all other variables are
distributed among finite clusters.
6 Dilute spin glasses: an exact solution for the
paramagnetic phase of the Viana-Bray model

Spin glasses are magnetic systems where the interactions between degrees of freedom 
(spins) are disordered. The key quantity that determines all the statistical properties 
of such systems is the free energy averaged over the distribution of the interactions. 



In most cases this average can only be computed with the replica trick UlSfl , often 
involving the technically subtle replica symmetry breaking (RSB). For infinitely
connected models, in which each spin interacts with all others in the sample, like the
Sherrington-Kirkpatrick model [28], the RSB scheme that yields the exact solution
in both high and low-temperature phases is well understood. The final aim, far
from being reached, is to determine the nature of the spin-glass phase of disordered
models on a $d$-dimensional lattice with only short-range interactions. Dilute spin
glasses can be viewed as an intermediate step between these two limits: each spin 
interacts with a finite number of other spins but the model includes no notion of 
distance since the "neighbors" are randomly chosen from the whole set of spins of 
the system. 

The standard dilute spin-glass model has been introduced by Viana and Bray 0. 
It is defined by the following Hamiltonian involving $N$ classical Ising spins, $S_i = \pm 1$,

$$H = -\sum_{i<j} J_{ij}\, S_i S_j . \qquad (19)$$

The interactions $J_{ij}$ are independently distributed with the same probability law:

$$P(J_{ij}) = \left( 1 - \frac{c}{N} \right) \delta(J_{ij}) + \frac{c}{N}\, p(J_{ij}) . \qquad (20)$$

$p$ is normalized to one, and has average and mean square deviation of order one so as to
obtain a sensible thermodynamic limit; $c$ is the mean connectivity per spin.

Despite numerous studies || [| |7j, a complete understanding of this model has 
not been reached yet. The main difficulty in the study of dilute models is that even 
without RSB, the order parameter is a function instead of a number as in the IC 
case. In order to introduce RSB one has to cope with an order parameter which is 
at least a functional, leading to very difficult calculations. 

The mean connectivity per spin $c$ plays here the same role as $\alpha$ in our expansion
of the 2-sat entropy. As opposed to K-sat, for the Viana-Bray model the percolation
and the paramagnetic (PM) to spin-glass transitions occur at the same critical value
$c = 1$. In the dilute phase, $c < 1$, the clusters have not percolated and the model
is in the PM phase |p3f . The statistical properties in this phase can be studied with 
the cluster expansion. The average free-energy per site reads 

$$-\beta f = \sum_t \frac{1}{L_t}\, c^{L_t - 1}\, e^{-L_t c}\, K_t\, \overline{\ln Z_t} , \qquad (21)$$

where the over-line denotes an average with respect to $p$ and only tree clusters
contribute in the thermodynamic limit. This is the analog of Eq. (Rl) with a slightly
different expression for $P_t$, where $2\alpha$ is replaced by $c$. $Z_t$ is the partition function of
the cluster. One can easily prove that for trees,



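$$\overline{\ln Z_t} = L_t \ln 2 + (L_t - 1)\, \overline{\ln\cosh\beta J} , \qquad (22)$$

irrespective of the topology. The sum of the symmetry factors of all trees with $L_t$
sites is $L_t^{L_t - 2}/(L_t - 1)!$ (a well-known result from graph theory [24]), which results in

$$-\beta f = \ln 2 \left( \sum_{k=1}^{\infty} \frac{e^{-ck}}{k!}\, (ck)^{k-1} \right) + \overline{\ln\cosh\beta J}\;\, c \left( \sum_{k=1}^{\infty} \frac{e^{-ck}}{k!}\, (k-1)\, (ck)^{k-2} \right) . \qquad (23)$$

The two sums can be evaluated (cf. Appendix) to yield

$$-\beta f = \ln 2 + \frac{c}{2}\, \overline{\ln\cosh\beta J} . \qquad (24)$$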

This result has a simple interpretation: from Eq. (22), for tree-like structures each site
contributes $\ln 2$ to the free energy and each link contributes $\overline{\ln\cosh\beta J}$. As there are
$N$ sites and $cN/2$ links on average, the result follows. One obtains exactly the same
free energy following the replica method with an RS Ansatz. The exactness of the RS
Ansatz in this phase is thus proven.
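The tree identity behind Eq. (22) can be checked by brute force on a small cluster: before averaging over the couplings, $\ln Z$ of a tree should equal $L \ln 2$ plus the sum of $\ln\cosh\beta J$ over its links. The sketch below is an illustration under stated assumptions: the energy sign convention $-\sum J S_i S_j$, the bimodal couplings $J = \pm 1$ and the helper names are choices made here, not taken from the paper.

```python
import itertools
import math
import random


def log_z_brute_force(n_sites, bonds, beta):
    """ln Z from explicit enumeration of all 2**n_sites Ising configurations.

    bonds is a list of (i, j, J); the energy is taken as -sum J * S_i * S_j
    (an assumed sign convention; ln Z on a tree does not depend on it).
    """
    z = 0.0
    for spins in itertools.product((-1, 1), repeat=n_sites):
        energy = -sum(J * spins[i] * spins[j] for i, j, J in bonds)
        z += math.exp(-beta * energy)
    return math.log(z)


def log_z_tree_formula(n_sites, bonds, beta):
    """L ln 2 + sum over links of ln cosh(beta * J): the tree result before averaging."""
    return n_sites * math.log(2.0) + sum(
        math.log(math.cosh(beta * J)) for _, _, J in bonds
    )


def random_tree(n_sites, rng):
    """Random tree: site j attaches to a uniformly chosen earlier site, with J = +/-1."""
    return [(rng.randrange(j), j, rng.choice((-1.0, 1.0))) for j in range(1, n_sites)]


if __name__ == "__main__":
    rng = random.Random(0)
    tree = random_tree(8, rng)
    print(log_z_brute_force(8, tree, 0.7), log_z_tree_formula(8, tree, 0.7))
    # Closing a loop breaks the identity: a triangle picks up ln(1 + t1*t2*t3).
    triangle = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
    print(log_z_brute_force(3, triangle, 1.0) - log_z_tree_formula(3, triangle, 1.0))
```

The triangle example shows why loops matter only at order $1/N$: a closed loop adds a term of the form $\ln(1 + \tanh\beta J_1 \tanh\beta J_2 \tanh\beta J_3)$ to the tree expression.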

The cluster expansion allows one to compute finite-size corrections in a rather simple
manner. These read, up to $O(c^3/N)$,

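$$\frac{1}{N} \left[ -\,\cdots\; \overline{\ln\cosh\beta J} + c^{\,\cdots} \left( -\,\cdots\, \ln 2 + \cdots\; \overline{\ln\cosh\beta J} + \cdots\; \overline{\ln\left( 1 + \tanh\beta J_1\, \tanh\beta J_2\, \tanh\beta J_3 \right)} \right) \right]$$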

where $J_1$, $J_2$ and $J_3$ are three independent couplings drawn from the probability
distribution $p(J_{ij})$. It would be very interesting to compare this result with the
finite-size corrections to the replica calculation using the RS Ansatz.

As in the K-sat problem, the expansion cannot be used beyond c = 1, since it 
does not take into account the giant cluster appearing at the percolation transition. 



7 Conclusions and perspectives 

The cluster expansion relies on very simple combinatorial arguments. It allows one
to solve optimization problems in the "easy" phase while avoiding the introduction of
replicas, and it is a general method to obtain finite-$N$ corrections. Most importantly,
it has allowed us to signal the possible need for a revision of the replica solution of
K-sat, and of similar problems in which the percolation and easy-to-hard
transitions occur successively.

For spin glasses in which these two transitions coincide, the interest of the
method is less apparent, as highly diluted systems are in the less interesting PM
phase. Still, it would be interesting to test whether the RS Ansatz is exact for any
spin-glass model below its percolation threshold, as was proven here for the Viana-Bray
spin glass in the thermodynamic limit.

Let us note, however, the difference between the percolating and critical behavior of
K-sat and of the Viana-Bray (VB) dilute spin glass. In the former, percolation occurs before
the satisfiability transition ($\alpha_p < \alpha_c$); in the latter both phenomena arise at the same
value of the control parameter ($c = 1$). This can be understood as being due to the
fact that 2-sat is, in a way, less frustrated than VB. Two manifestations of this fact
are given by the behavior of a single loop and a linear cluster. A single loop in 2-sat 
is always satisfiable while in VB it is frustrated each time there is an odd number 
of antiferromagnetic couplings on it. A linear cluster in 2-sat can be satisfied by a 
large number of configurations while in VB only two spin configurations satisfy all 
bonds. 
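The two statements above can be verified by exhaustive enumeration on small clusters. The following sketch rests on assumptions made here for illustration: a bond is called satisfied when $J S_i S_j > 0$, a loop is called frustrated when no spin configuration satisfies all of its bonds, and all function names are hypothetical.

```python
import itertools


def count_2sat_solutions(n_vars, clauses):
    """clauses: list of ((i, si), (j, sj)); literal (i, True) is x_i, (i, False) is NOT x_i."""
    count = 0
    for x in itertools.product((False, True), repeat=n_vars):
        if all(x[i] == si or x[j] == sj for (i, si), (j, sj) in clauses):
            count += 1
    return count


def count_vb_unfrustrated(n_spins, bonds):
    """Number of spin configurations with J * S_i * S_j > 0 on every bond (i, j, J)."""
    count = 0
    for s in itertools.product((-1, 1), repeat=n_spins):
        if all(J * s[i] * s[j] > 0 for i, j, J in bonds):
            count += 1
    return count


def loop_pairs(n):
    """Links of a single loop on n sites."""
    return [(k, (k + 1) % n) for k in range(n)]


def chain_pairs(n):
    """Links of a linear cluster on n sites."""
    return [(k, k + 1) for k in range(n - 1)]


def every_2sat_loop_is_satisfiable(n):
    """Check all 2**(2n) choices of literal signs on a loop of n clauses."""
    pairs = loop_pairs(n)
    for signs in itertools.product((False, True), repeat=2 * n):
        clauses = [((i, signs[2 * k]), (j, signs[2 * k + 1]))
                   for k, (i, j) in enumerate(pairs)]
        if count_2sat_solutions(n, clauses) == 0:
            return False
    return True


def vb_loop_frustration_matches_parity(n):
    """A VB loop is frustrated exactly when it has an odd number of negative couplings."""
    pairs = loop_pairs(n)
    for couplings in itertools.product((-1, 1), repeat=n):
        bonds = [(i, j, J) for (i, j), J in zip(pairs, couplings)]
        frustrated = count_vb_unfrustrated(n, bonds) == 0
        odd_negative = sum(J < 0 for J in couplings) % 2 == 1
        if frustrated != odd_negative:
            return False
    return True


if __name__ == "__main__":
    print(every_2sat_loop_is_satisfiable(4), vb_loop_frustration_matches_parity(4))
    chain = chain_pairs(6)
    print(count_2sat_solutions(6, [((i, True), (j, True)) for i, j in chain]))
    print(count_vb_unfrustrated(6, [(i, j, J) for (i, j), J in zip(chain, (1, -1, -1, 1, -1))]))
```

For the linear cluster, the 2-sat count grows with the length of the chain, while the VB chain always has exactly two configurations satisfying all bonds, related by a global spin flip.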

The cluster expansion can be applied to a variety of interesting problems. For
instance, algorithms that solve satisfiability problems through local search, like
walk-sat, can also be studied with this method [12]. The number of steps needed
to solve a formula is the sum of the numbers of steps needed to solve each cluster.
Improvements of the algorithms by means of better heuristics can thus be quanti- 
fied u. 
Appendix

In this appendix we prove the two summations used in Eq. (23):

$$A \equiv \sum_{k=1}^{\infty} \frac{e^{-ck}}{k!}\, (ck)^{k-1} = 1 , \qquad (25)$$

$$B \equiv \sum_{k=1}^{\infty} \frac{e^{-ck}}{k!}\, (k-1)\, (ck)^{k-2} = \frac{1}{2} . \qquad (26)$$



The proof relies on a mathematical identity proven with the tools of analytic
function theory [33]. Let $w(z)$ be a given function which can be inverted to $z(w)$.
Then the coefficients of the series expansion of $z(w)$ are obtained via the following
expressions:

$$z(w) = \sum_{k=1}^{\infty} \frac{b_k}{k!}\, w^k , \qquad (27)$$

$$b_k = \left. \frac{d^{k-1}}{dt^{k-1}} \left( \frac{t}{w(t)} \right)^{k} \right|_{t=0} . \qquad (28)$$



Let us consider the function $w(z) = z \exp(-cz)$, which is a bijection from $[0, 1]$
to $[0, e^{-c}]$ if $c < 1$. The coefficients of the series expansion of the inverse function $z(w)$ are

$$b_k = \left. \frac{d^{k-1}}{dt^{k-1}}\, e^{ckt} \right|_{t=0} = (ck)^{k-1} . \qquad (29)$$

Thus, $A = z(e^{-c}) = 1$, as $w(1) = e^{-c}$.

The second series can be transformed using $A = 1$:

$$B = \frac{1 - e^{-c}}{c} - \sum_{k=2}^{\infty} \frac{(e^{-c})^k}{k!}\, (ck)^{k-2} . \qquad (30)$$

If we define

$$g(w) = \sum_{k=2}^{\infty} \frac{w^k}{k!}\, (ck)^{k-2} , \qquad (31)$$

then $B = (1 - e^{-c})/c - g(e^{-c})$. To compute $g(w)$, we note that $g(0) = 0$ and
$g'(w) = z(w)/(cw) - 1/c$. By integration and with the change of variables $z = z(w)$,
one obtains

$$g(e^{-c}) = -\frac{e^{-c}}{c} + \frac{1}{c} \int_0^1 dz\, w'(z)\, \frac{z}{w(z)} = \frac{1 - e^{-c}}{c} - \frac{1}{2} . \qquad (32)$$

This yields the final result B = 1/2. 
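The two results, $\sum_{k \ge 1} e^{-ck} (ck)^{k-1}/k! = 1$ and $\sum_{k \ge 1} e^{-ck} (k-1)(ck)^{k-2}/k! = 1/2$ for $0 < c < 1$, can also be checked numerically by summing the series directly. The sketch below is only a sanity check; the function names and the truncation `k_max` are assumptions made here, and each term is evaluated through its logarithm to avoid overflow of $k!$ and $(ck)^{k}$.

```python
import math


def series_A(c, k_max=5000):
    """Partial sum of sum_{k>=1} exp(-c*k) * (c*k)**(k-1) / k!, for 0 < c < 1."""
    total = 0.0
    for k in range(1, k_max + 1):
        log_term = -c * k + (k - 1) * math.log(c * k) - math.lgamma(k + 1)
        total += math.exp(log_term)
    return total


def series_B(c, k_max=5000):
    """Partial sum of sum_{k>=1} exp(-c*k) * (k-1) * (c*k)**(k-2) / k!.

    The k = 1 term vanishes because of the factor (k - 1), so the loop starts at k = 2.
    """
    total = 0.0
    for k in range(2, k_max + 1):
        log_term = (-c * k + math.log(k - 1) + (k - 2) * math.log(c * k)
                    - math.lgamma(k + 1))
        total += math.exp(log_term)
    return total


if __name__ == "__main__":
    for c in (0.2, 0.5, 0.8):
        print(c, series_A(c), series_B(c))
```

The terms decay exponentially for $c < 1$, with a rate that vanishes as $c \to 1$, so a larger `k_max` is needed close to the percolation threshold.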